The Centre for Information Quality Management (CIQM) 
was set up by the Library Association and UK (United Kingdom) Online 
User Group to act as a clearinghouse to which database users may 
report problems relating to the quality of any aspect of a database 
being used. CIQM acts as an intermediary between the user and 
information provider in obtaining solutions and collects statistics 
on database quality issues which it provides to the information 
industry. CIQM has proposed "Data Labelling" as a means by which 
users can be made aware of database capabilities and limitations. 
Database Labels are short specifications that include a qualitative 
assessment of a database's performance. Labels would be created by 
the information provider and include a complete statement of subject 
coverage, the total number of records, detailed geographic, language 
and time coverage, and simple statements of policy on points such as 
indexing and inclusion. Labels would have a uniform appearance in 
order to distinguish them from other documentation, and would be 
generated regularly, ideally with each product update. If Labels were 
accredited by an impartial agency, their value would be significantly 
enhanced, and Labels would then serve as a guarantee of product 
quality. Ways to implement labeling, implications, and barriers are 
discussed. (SWC) 

Database Quality 

Amongst a lot of recent talk, articles and papers 
about quality in the information industry, an 
initiative by two professional organisations has 
already gone a long way in helping users cope with 
quality issues and, at the same time, has begun 
looking for a means of providing some security for 
future database users. The Centre for Information 
Quality Management (CIQM) was set up by the 
Library Association and the UK Online User Group 
to act as a clearing house to which database users 
may report problems relating to the quality of any 
aspect of a database being used (search software, 
data, indexing, documentation, training). CIQM 
undertakes to forward the problem to the appropriate 
body (information provider, online host, CD-ROM 
publisher) and route the response back to the 
user. This activity enables the collection of statistics 
on database quality issues which are fed back into 
the information industry. The service is free to 
users. 

The overall objective of the Centre is to improve 
the quality of databases (online, CD-ROM, diskette, 
tape) and, in so doing, work towards developing a 
set of metrics by which database quality can be 
measured. Funding from the British Library 
Research & Development Department has enabled 
the Centre to begin work in this area and the 
remainder of this paper explores one possible 
methodology which offers users guaranteed perfor- 
mance levels for databases. 

Currently, users have no knowledge of the formal 
specification for a database they are using - in 
effect, they are paying for an unknown quantity. 
Added to this, publicity material frequently gener- 
ates unrealistic expectations that are not met when 
searching at the terminal. More reasonable 
expectations - for example, that authority files are 
used in the generation of primary index fields - are 
not always met either. No database so far evaluated 
at CIQM has standardised publisher names; this means 
that users frequently need to search for both ‘John 
Wiley’ and ‘Wiley, John’, for example. In one database 
the place of publication index contained over 
40 variations on London including mis-spellings, 
concatenated MARC fields, and comments - ‘Lond’, 
‘Londin’, ‘LondonbRoutledge’ (the ‘b’ is the 
remains of the ‘$b’ sub-field marker), ‘London sic’, 
etc. 
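
The kind of authority-file check described above might be sketched as follows. This is only an illustration: the function name, the sample authority list and the sample index terms are assumptions made for the example, and a real index would need fuller normalisation rules than the simple case-insensitive match used here.

```python
def check_against_authority(index_terms, authority):
    """Split index terms into those found in the authority file and those not.

    Matching is case-insensitive and ignores surrounding whitespace;
    nothing cleverer (no fuzzy matching) is attempted in this sketch.
    """
    canonical = {term.strip().casefold() for term in authority}
    matched, unmatched = [], []
    for term in index_terms:
        key = term.strip().casefold()
        if key in canonical:
            matched.append(term)
        else:
            unmatched.append(term)
    return matched, unmatched


# Hypothetical authority file and place-of-publication index entries.
AUTHORITY = ["London", "New York", "Oxford"]
INDEX = ["London", "Lond", "Londin", "LondonbRoutledge", "London sic", "Oxford"]

matched, unmatched = check_against_authority(INDEX, AUTHORITY)
print("matched:  ", matched)
print("unmatched:", unmatched)
```

The unmatched list is the part of interest: each entry in it is a variant that a searcher would have to guess at in order to retrieve the record.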

Many of the quality issues reported to CIQM 
reflect this gap in expectations and there seems to 
be a clear need - as a part of any drive to improve 
database quality - to develop a means by which 
users are made aware of database capabilities. The 
means being investigated at CIQM is Database 
Labelling. 

Database Labelling 

Database Labelling was first suggested by Péter 
Jacsó in a guest editorial in Database as analogous 
to food and drug labelling (Jacsó, 1993). 
Database Labels are short specifications which 
include some qualitative assessment of a database’s 
performance. They offer potential users a means 
whereby they can determine exactly what is in a 
database and whether they want to use it: the extent 
to which they can ‘trust’ it. 

The brief current description is supplied or creat- 
ed by the database owner/information provider and 
summarises the more complete and lengthy docu- 
mentation in a way that users would find both easy 
to understand and accessible: a ‘Contents List’ supplied 
in a standard, recognisable format. One possible 
example is given in Jacsó’s article. 

On the one hand, the Label would supply a data- 
base specification including a complete statement of 
subject coverage (perhaps in the form of a topic 
list), the total number of records, detailed geograph- 
ic, language and time coverage, and simple state- 
ments of policy on such points as indexing and 
inclusion. On the other hand, some measure of these 
might be given by noting the numbers of records 
against years, countries and languages, the average 
numbers of descriptors per record, and percentages 
for information points such as records with 
abstracts. 

Factual information, such as number of records, 
geographical coverage, subject description or avail- 
able fields, is supplemented by qualitative informa- 
tion which qualifies it: thus, geographical coverage 
could include the percentages of records for each 
country and the list of available fields could include 
the number (or percentage) of records with actual 
data in each of the field types. 
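
As a rough illustration of how such figures might be derived, the sketch below counts, for a small set of records, the share of records per country and the share of records that hold actual data in each field. The record layout, the field names and the function names are assumptions made for the example; they are not part of any CIQM specification.

```python
from collections import Counter


def field_fill_rates(records, fields):
    """Percentage of records holding non-empty data in each field."""
    total = len(records)
    rates = {}
    for field in fields:
        # A field counts as filled only if it holds non-blank text.
        filled = sum(1 for r in records if str(r.get(field) or "").strip())
        rates[field] = round(100.0 * filled / total, 1) if total else 0.0
    return rates


def share_by(records, field):
    """Percentage of records per value of `field` (for example, country)."""
    counts = Counter(r.get(field) or "unknown" for r in records)
    total = sum(counts.values())
    return {value: round(100.0 * n / total, 1) for value, n in counts.items()}


# Hypothetical sample records.
SAMPLE = [
    {"title": "A", "abstract": "text", "country": "UK"},
    {"title": "B", "abstract": "", "country": "USA"},
    {"title": "C", "abstract": None, "country": "USA"},
    {"title": "D", "abstract": "text", "country": "USA"},
]

print(field_fill_rates(SAMPLE, ["title", "abstract", "country"]))
print(share_by(SAMPLE, "country"))
```

With this sample, the abstract field is filled in half of the records, which is exactly the kind of figure a bare list of available fields would conceal.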

The Label would immediately show exactly what 
a database could do for users, leaving them with no 
unreasonable expectations. The Label would 
become a quality assurance statement demonstrating 
to what extent the database could be relied upon or 
‘trusted’. The factual information would give unam- 
biguous parameters for coverage and use while the 
qualitative metrics would demonstrate how well the 
database functioned in these areas. 

The Label removes the possibility of unsubstanti- 
ated marketing claims such as, ‘The database has 26 
access points’ (indexes to be used in searching) 
which can no longer disguise the fact that - as has 
often been found - many of the 26 indexes do not 
contain data from every record. If an indexed field 
has only been filled for 80% of the records this will 
show on the Label. 

Databases appear on different online hosts or CD- 
ROMs and may have a quite different appearance in 
each version. Different fields may be made avail- 
able (with or without abstracts, for example), the 
indexing is generated by the vendor, print formats 
will almost certainly vary and software-related 
aspects which affect access and ease of use are certain 
to differ. For these reasons, Labels for each 
manifestation of the database will have to be 
generated - probably as a joint effort which involves both 
the information provider and the vendor/publisher. 

Labels must have a uniform appearance in order 
to distinguish them from other documentation and a 
standard layout will make their use by users and 
prospective users simpler - comparisons can be 
made more easily. Some form of branding on the 
Label, for example by incorporating the CIQM 
logo, might be appropriate as it would mean that 
users could readily identify an independent ‘Label’ 
from other sales or marketing literature from the 
producer. 

Effectively, the Label would become a database- 
specific standard. However, in using the term, ‘stan- 
dard’, care has to be taken to distinguish between a 
Standard as defined by BSI or ISO procedures and 
the idea of an entirely local standard (or level of 
quality) which is specific to a given product. The 
information provider would specify database para- 
meters as they pertain to a database at the point in 
time that the Label is first generated and then seek 
to adhere to or better that performance. 

To be effective, the Label should be generated 
regularly - ideally to coincide with the normal vendor 
update cycle - and should be circulated with 
publicity material and made available on exhibition 
stands. It must also be made available to prospective 
users - published - in some form. 

Even as described so far, a Database Label would 
perform a useful function, demonstrating to users 
the exact performance level of any database and act- 
ing as a benchmark against which future perfor- 
mance can be tested by users and producers alike. If 
Labels were accredited by an impartial agency, their 
value would be significantly enhanced. Labelled 
databases would, in effect, have a guarantee of qual- 
ity. The Label would be seen by the user as an inde- 
pendent assessment of the database offering them a 
security hitherto unavailable. 

The Accreditation Body 

Accreditation by means of the Labels offers 
users a guarantee of quality and producers a 
‘kite mark’ to flag their database as trustworthy. In 
turn, accreditation implies the existence of a neutral 
body which would be responsible for the mecha- 
nism of Label provision, verification and publica- 
tion. 

One of the most apparent problems with 
Labelling is the amount of additional work thrust on 
information providers and vendors. Labels become 
far more viable in terms of the workload if the central 
body (perhaps CIQM in association with the 
Library Association) produces a form to be filled in 
by producers. 

As has been suggested, all Labels should look 
identical to the user. Consistency of Labelling is 
desirable but different services and different types 
of data are designed to meet different needs. The 
central body - liaising with database producers, 
hosts and publishers - will first need to take responsibility 
for developing a format for the Label and for 
producing guidelines as to what information should 
be put against the headings. It would be, in essence, 
a blank form which producers then fill in. It is not 
possible to define a single, standard dataset that can 
be applied to all databases; each database is different 
(bibliographic, image or text, for example) so it 
is not practicable to use one form of Label for all. A 
more pragmatic approach, using a standard core of 
headings with options for the producer’s own information 
or different Labels for different types of databases, 
might be more practicable. 

In addition to specifying headings on the form for 
what should be included on the Label - for example, 
the number of records, coverage, fields, indexing, or 
publication years - definitions or ‘scope notes’ rendering 
the form easy to complete will be required. It 
is essential that the task of producing the data for 
the Labels is simplified and automated as far as pos- 
sible so that the information providers and vendors 
are able to supply the information regularly without 
detriment to their database production schedules. It 
may be most convenient for forms to be generated 
and returned electronically. 

Once a database producer and database publisher 
have filled in the ‘form’, it would be submitted to 
CIQM for audit and checking. When they have been 
approved these Labels could then be published 
and/or distributed to users by CIQM or some other 
publishing body. Simplicity is vital if the Label is to 
be of real help to users of a database. After the 
Label has been issued, the database will have to be 
periodically checked against the Label and the 
Label updated to ensure that it continues to accu- 
rately reflect the content and nature of the database. 
Periodically, new Labels will be published. 
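
One way the periodic check of a database against its Label might be automated is sketched below. The metric names, the sample figures and the one-percentage-point tolerance are all assumptions chosen for illustration; no audit procedure has been specified by CIQM.

```python
def audit_label(label, measured, tolerance=1.0):
    """Return metrics where the database falls short of its Label.

    A metric is reported if it is missing from the measurements or if the
    measured value is more than `tolerance` percentage points below the
    value stated on the Label. Values that meet or better the Label pass.
    """
    shortfalls = {}
    for metric, stated in label.items():
        actual = measured.get(metric)
        if actual is None or actual < stated - tolerance:
            shortfalls[metric] = (stated, actual)
    return shortfalls


# Hypothetical Label values and freshly measured values (percentages).
LABEL = {"abstract_fill_pct": 80.0, "descriptor_fill_pct": 95.0}
MEASURED = {"abstract_fill_pct": 82.5, "descriptor_fill_pct": 90.0}

print(audit_label(LABEL, MEASURED))
```

A check of this sort only flags shortfalls; improvements over the stated figures would simply be picked up when the next Label is generated.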

The mechanism for publishing the Labels has yet 
to be decided but, apart from making copies avail- 
able to the information owner and the vendor to be 
distributed with documentation and publicity mater- 
ial, a means has to be identified which will make the 
Label readily available to any existing or potential 
user. The Internet may offer the most appropriate 
channel. Additionally, it is hoped that publishers of 
independent database directories might flag accred- 
ited databases in some way. 

Will Labels Work? 

In setting out this methodology for database quality 
assurance and in describing the possible advantages, 
it is important not to overlook the cost 
element - which would fall largely to the information 
provider - and other issues of use. 

Labels must provide an accurate picture of a data- 
base as it exists when the Label is created or updat- 
ed. Many of the major and most-used databases 
have been available electronically for 20 or more 
years and in this time have changed considerably. 
New fields may have been added (for example, an 
abstract) or fields may have been divided up to pro- 
vide better access (Source field divided into Journal, 
Publication Year, Volume, Issue, etc, for example); 
thesaural control may have been introduced at some 
point; and coverage will almost certainly have 
improved. To give ‘scores’ representing the entirety 
of the database would give a false or a skewed 
impression of current production. It is not sufficient, 
for example, to show that 80% of the total content is 
from the United States when the average update 
since 1995 is 50% from the USA, 20% from the UK, 
with the remaining 30% from continental Europe. 
One solution may be to show the dates of change; 
the date that fields came into existence and their rat- 
ing for use in records from that date only, for exam- 
ple. 
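
The difference between a whole-file score and a score for recent updates can be shown with a small worked example. The sketch below uses invented records constructed so that the whole file is 80% from the USA while the records added since 1995 split 50/20/30; the record layout and function name are assumptions made for the illustration.

```python
from collections import Counter


def country_share(records, since=None):
    """Percentage of records per country, optionally from `since` onwards."""
    selected = [r for r in records if since is None or r["year"] >= since]
    counts = Counter(r["country"] for r in selected)
    total = len(selected)
    return {c: round(100.0 * n / total, 1) for c, n in counts.items()}


def make(year, country, n):
    """Build n hypothetical records for one year and country."""
    return [{"year": year, "country": country} for _ in range(n)]


# Invented data: 80 older records plus 20 records added since 1995.
RECORDS = (
    make(1985, "USA", 70) + make(1985, "UK", 6) + make(1985, "Europe", 4)
    + make(1996, "USA", 10) + make(1996, "UK", 4) + make(1996, "Europe", 6)
)

print("whole file:", country_share(RECORDS))
print("since 1995:", country_share(RECORDS, since=1995))
```

Reporting both figures side by side, each with its date range, would let a user see that current coverage is broader than the historical total suggests.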

Unlike some publicity material and database factsheets, 
the Labels will need to be completely reproduced 
or updated several times each year; this clearly 
has considerable overheads in terms of both time 
and costs. Updating such Labels for all of a producer’s 
databases in all their various forms would be a 
major task. It will certainly be necessary to date the 
Labels clearly on the front in order that users can 
see clearly that they are using a relevant and current 
version. 

The volume of data to be condensed into a rela- 
tively small amount of space - no more than four A4 
pages - is also problematic. It may be possible to 
balance the short, summary Labels with documenta- 
tion made available electronically - possibly via the 
Internet - with links from individual databases. This 
is already happening to some extent; for example 
SilverPlatter has made available a free database of 
software parameters, hardware specifications and 
database details on their homepage. 

A further consideration is the increasing use of 
databases distributed over local area networks (for 
example, in universities); how are the many users 
(many of them vulnerable end users) to be presented 
with the Labels? Users in any situation cannot be 
made to read the Label but it will be necessary to 
make users aware of the possibilities for quality 
control that are open to them. Local training and 
publicity supplied by library staff can back up 
efforts made by the information providers but the 
most useful tool may well be a logon message ask- 
ing, ‘Have You Read The Label?’ 

The Future 

Database Labelling offers considerable benefits 
to users but will require a not inconsiderable 
infrastructure to function. Is it all possible? There is 
a huge backlog of databases to be ‘Labelled’ and a 
feasibility study will be necessary to assess the scale 
of the project. The consensus of opinion at a meet- 
ing of information providers earlier this year was 
that, at the very least, some preliminary research 
should be undertaken. 

Future work at the Centre for Information Quality 
Management will aim to: 

- raise the level of awareness of its aims and 
activities amongst users and the information 
industry 

- gather more information from users across 
Europe on what they consider to be important 
quality issues as well as on the efficacy of 
Database Labelling 

- develop a design for the Labels and the input 
form (complete with scope notes), and 

- set up feasibility and pilot studies to look at the 
mechanisms for the various stages of Labelling 
and the costs involved for both an accreditation 
body and the database industry. 

It may be that a part of the infrastructure ultimately 
involves legal requirements to Label databases or it 
may be that Labelling progresses naturally due to 
peer and user pressures. One thing does seem clear: 
if the scheme goes ahead, the unaccredited databases 
will tend to lose market share to those that are 
accredited while the Labelled databases will be less 
liable to complaints from users - the Labels will 
ensure that users have no misconceptions about 
database scope and capabilities at the same time that 
the Label’s benchmarking role gradually drives 
quality up. 